Overview
The VoiceRec control plays a greeting and then waits for an utterance from the caller, from which it attempts to recognize words or phrases using the selected grammar.
Nuance and MRCP engines
If the words fit the grammar, the call exits via the EndRec node and the results are stored in the Words property. There is no confirmation of the results. If a DTMF digit is received during recognition, the call exits via the Dtmf node and the digit value is stored in the Words property.
See the example NuanceSpeechRec.
Grammars
The grammar file is created by a grammar compiler specific to the speech recognition engine in use. The grammar file defines one or more different grammars. Each grammar consists of a list of words that can be recognized, and optionally the context in which they are recognized.
For more details on grammar files and speech recognition in general, see About Speech Recognition. The DBAuth and GrammarName properties can contain control references.
The grammar file and grammar name can now contain property references such as %User1.Value%.
Barge-In
Barge-in is the ability of the system to detect when the caller is speaking, terminate the greeting that is playing, and simultaneously start recognizing the speech. This function requires echo cancellation, which imposes particular hardware requirements on your voice card. Contact the Pronexus sales department to discuss any hardware compatibility questions.
DTMF Reception
If a DTMF digit is received during the EntryGreeting or during recognition, the call will exit out of the Dtmf node. This allows the caller to revert to using DTMF digits if required.
Invalid Digits, No Digits Handling
If no words are heard, or if the recognition fails, the control allows the caller to try again up to a preset number of attempts. If an invalid digits or no digits event occurs and the error count exceeds the number of retries set, the VoiceRec control performs error handling as described below.
If the error count has not been exceeded and an invalid word or digit sequence has been received, VBVoice plays the InvalidGreeting, if set, followed by the EntryGreeting, and increments the error count. The default Invalid Digits greeting is "That was not a valid entry." If no words or digits have been received, VBVoice plays the SilenceGreeting, followed by the EntryGreeting, and increments the error count. The default SilenceGreeting is empty.
Nuance and MRCP
If the recognition fails due to no speech and the retry count is exceeded, the call is transferred to the control connected to the No Digits output, if this has been enabled using the Use default error handler checkbox. If the recognition fails for any other reason, the call is transferred to the control connected to the Invalid Digits output, if this has been enabled using the Use default error handler checkbox.
If the Invalid Digits and No Digits nodes have not been set, the control attempts to invoke one of the default error handlers.
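The retry rules above can be summarized as a small decision function. This is only an illustrative model written in Python, not VBVoice code: the function name, the return tuple, and the assumption that the handler runs as soon as the incremented error count exceeds MaxRetries are all hypothetical.

```python
def next_action(event, error_count, max_retries, retry_on_silence=True):
    """Model what happens after one failed recognition attempt.

    event: "silence" (nothing heard) or "invalid" (recognition failed).
    error_count: errors already counted for this call in this control.
    Returns (action, detail, new_error_count).
    """
    if event not in ("silence", "invalid"):
        raise ValueError("event must be 'silence' or 'invalid'")
    new_count = error_count + 1
    exit_node = "No Digits" if event == "silence" else "Invalid Digits"
    # With RetryOnSilence off, the first silence goes straight to the handler.
    if event == "silence" and not retry_on_silence:
        return ("exit", exit_node, new_count)
    # Assumption: the handler runs once the count exceeds MaxRetries.
    if new_count > max_retries:
        return ("exit", exit_node, new_count)
    # Otherwise the matching greeting plays, followed by the EntryGreeting.
    greeting = "SilenceGreeting" if event == "silence" else "InvalidGreeting"
    return ("retry", greeting, new_count)
```

For example, with two retries allowed, a first invalid result leads to a retry, while a third silence leaves through the No Digits output.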
Err Input
The VoiceRec control can also be entered from a second input, Err. This input causes the control to act as if an invalid word has been received: it increments the error count, and if the count exceeds the maximum retry count, the call is transferred to the Invalid exit node. This input can be used when words have been collected and are then checked by code or against a database. If the words are incorrect, they can be treated identically to words which did not match any of the conditions in the control.
VoiceRec Control Example
Use of this control is shown in the example NuanceSpeechRec.
Licensing
Voice recognition is licensed by the number of channels performing recognition concurrently.
For both MRCP and Nuance engines, the initial number of available licenses (engines to be created) is the smaller of these values:
- The maximum number of channels defined by the type of license purchased
- The INI setting NumberOfEngines in ASR section (default 96)
- The total number of channels on all the installed boards
For Nuance, you also have to take into account:
- The number of Nuance licenses available when the system is started
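The "smaller of these values" rule amounts to taking a minimum over the listed limits. The sketch below is written in Python for illustration; the function and parameter names are hypothetical, and only the default of 96 for NumberOfEngines comes from the text above.

```python
def initial_engine_count(license_channels, total_board_channels,
                         number_of_engines=96, nuance_licenses=None):
    """Return the initial number of recognition engines to create.

    license_channels: maximum channels allowed by the purchased license.
    total_board_channels: channels on all installed boards.
    number_of_engines: the NumberOfEngines INI setting (default 96).
    nuance_licenses: Nuance licenses available at startup (Nuance only).
    """
    limits = [license_channels, number_of_engines, total_board_channels]
    if nuance_licenses is not None:
        # Nuance adds one more limit to the comparison.
        limits.append(nuance_licenses)
    return min(limits)
```

With a 48-channel license on 120 board channels, the license is the limiting value; with a 200-channel license, the default NumberOfEngines setting is.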
Nuance and MRCP
If the INI setting AllocEnginePerCall is set to 1, the number of available licenses is decreased by one when a call starts; otherwise, this happens only when the call enters the VoiceRec control.
The number of available licenses is increased again only when the control releases the engine (see IReleaseEngineOnExit, ReleaseEngineOnExit properties) or when the call is terminated. If the engine is released, all subsequent VoiceRec controls in the same call will not be allowed to grab a new engine and a NoLicenseAvailable event will be fired.
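The allocation rules above can be modeled as a small counter. This Python sketch is a simplified illustration, not the VBVoice implementation: the class, its method names, and the behavior when no engine is free at call start are assumptions.

```python
class EnginePool:
    """Toy model of recognition license accounting per call."""

    def __init__(self, engines, alloc_engine_per_call=False):
        self.available = engines
        self.alloc_engine_per_call = alloc_engine_per_call
        self._state = {}  # call id -> "none", "held" or "released"

    def start_call(self, call_id):
        self._state[call_id] = "none"
        # AllocEnginePerCall=1: the license is taken when the call starts.
        if self.alloc_engine_per_call:
            return self._grab(call_id)
        return True

    def enter_voicerec(self, call_id):
        """Return False where NoLicenseAvailable would be fired."""
        state = self._state[call_id]
        if state == "held":
            return True
        if state == "released":
            return False  # a released call may not grab a new engine
        return self._grab(call_id)

    def release_engine(self, call_id):
        if self._state[call_id] == "held":
            self.available += 1
            self._state[call_id] = "released"

    def end_call(self, call_id):
        # Ending the call returns the engine if it is still held.
        if self._state.pop(call_id) == "held":
            self.available += 1

    def _grab(self, call_id):
        if self.available == 0:
            return False
        self.available -= 1
        self._state[call_id] = "held"
        return True
```

The model makes the consequence of an early release visible: once a call has released its engine, a later VoiceRec control in the same call fails even if engines are free.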
Nuance specific
- Dialogic Springware cards and Dialogic DM3 cards using cardtype = Dialogic
The current Nuance DM3 audio provider exhibits problems sharing Nuance ports between a larger number of VBVoice channels. This occurs when using CSP-enabled Dialogic Springware (JCT series) hardware. When using this DM3 audio provider, you must have a 1:1 ratio of Nuance engines to VBVoice channels. You must also add the following vbvoice.ini setting to disable the problematic dynamic resource allocation:
[Nuance]
DynamicAlloc=0: Each VBVoice channel will have a corresponding Nuance engine attached (no floating of engines will be available).
Alternatives to this Nuance DM3 audio provider are available. To disable the use of this provider, add the vbvoice.ini setting below. Note that the alternative provider allows engines to float between VBVoice channels; however, each call experiences a short gap of silence (about 1-2 seconds) when the first voice recognition is required. The delay occurs on the first recognition of the call only; subsequent recognitions during that call are not affected.
In general, depending on the particular voice card hardware, it may be necessary to obtain a 1:1 mapping of Nuance engines to VBVoice telephony lines. Contact Pronexus sales to discuss any possible compatibility issues.
[Nuance]
NuanceCSP=0: VBVoice will not use the default Nuance DM3 audio provider
DynamicAlloc=0: (No longer necessary when not using the DM3 audio provider.)
- Dialogic DM3 and Dialogic HMP
For card types Intel and IntelHMP, a Pronexus Audio Provider is used; all voice resources must be CSP enabled.
Initial Setup Properties
BeepBeforeSpeech
(Boolean)
If set, the control will play a beep before each recognition attempt. The beep is generated from the BEEP.WAV file and can be modified as required.
ClearDigits
(Boolean)
If set to True, the digit buffer is cleared when a call enters the control. This property can be set in the Terminations page.
DisableHelp
(Boolean)
Set to True to disable the help digit handler. If not set (the default) and a help digit is detected (as defined in the LineGroup control), the call transfers to either the control set in the Connections property page or the LineGroup help digit output. See Help Digit. This property can be set in the Terminations page.
DisconnectControl
(String)
See Responding to Caller Hangup.
GlobalToneControl
(String)
See Global Tone Handling.
GrammarName
(String)
This property contains the name of the grammar to be used by the engine. If no grammar name is specified, the Nuance grammar to be used can be set at runtime using the NuanceGrammar property. If no Nuance grammar is specified (using either GrammarName or NuanceGrammar), a voice error will occur at runtime. A Nuance context can also be used as a grammar name.
This property will accept control property references. See also GrammarFile below.
GrammarFile
(String)
This property contains the name of the grammar text file. It is only used at design time to pull in and validate the voice commands. At runtime, this property is ignored.
(Not supported by MRCP)
DBAuth
(String)
This property contains the username and password needed to connect to a relational database. The string should be in the username:password format.
(Not supported by MRCP)
DBFormat
(String)
This property contains a string identifying the data types for the database provider. Your database provider should support variable length binary data that can be fetched and written piece by piece. For example, Microsoft SQL Server 7.0 supports IMAGE data types. If you do not specify this option, the data type LONG RAW is used by default, which may not be the right data type for your database provider.
(Not supported by MRCP)
DBName
(String)
This property contains the name of the database, either file system based or Oracle database.
(Not supported by MRCP)
DBProvider
(String)
This property contains a string identifying the database provider. The only supported values are fs for file system and oci for Oracle.
(Not supported by MRCP)
DBRoot
(String)
This property contains a string identifying the database root directory. Used for file system only.
(Not supported by MRCP)
DBServer
(String)
This property contains a string identifying the database alias used to connect to the database over the network. Used for Oracle only.
(Not supported by MRCP)
HelpDigitControl
See Help Digit.
IBargeIn
(Boolean)
When set to True, barge-in is enabled (i.e. the play is stopped when the caller starts speaking).
IDoNBest
(Boolean)
(Supported by Nuance only)
When set to True, N-Best processing is enabled. The N-Best recognition processing method generates a list of possible recognition results, ranked from highest to lowest likelihood, instead of generating only the best single solution.
IMaxSil
(Integer)
The maximum amount of silence (in seconds) before recognition is terminated.
IMaxKeys
(Integer)
Not used.
IMaxTime
(Integer)
The maximum duration (in seconds) of an utterance that may be accepted for recognition as one sentence.
INumNBest
(Integer)
Controls how many N-Best results can be generated.
InvalidErrorControl
(String)
See Invalid Digit, No Digits, and Silence Timeout.
IRecordDirectory
(String)
(Supported by Nuance only)
The directory where the recognized utterances are to be saved.
IRecordFilename
(String)
(Supported by Nuance only)
The name of the file containing the recognized utterances.
IRecordUtterance
(Boolean)
(Supported by Nuance only)
Enables or disables the recording of recognized utterances.
IReleaseEngineOnExit
(Boolean)
When set to True, the recognition engine is released and becomes available for another VoiceRec control on another channel or for a new call. If the engine is released, all subsequent voice recognition requests during the current call will fail; it is not allowed to use a VoiceRec control for a call which has released the recognition engine.
MaskLogDigit
(Boolean)
Hides the collected digits in the VBVLog; the text "Protected" is logged instead. This feature is intended for collecting security-sensitive information such as passwords and bank account numbers.
MaxRetries
(Integer)
The maximum number of retries before the error handler (No digits or Invalid digit) is invoked. See also RetryOnSilence. This property can be set in the Terminations property page.
NoDigitsErrorControl
See Invalid Digit, No Digits and Silence Timeout.
RetryOnSilence
(Boolean)
If set to True (default), then normal silence handling will operate. If set to False, then the error handler for silence will be invoked after the first detection of silence, regardless of the setting for Max Number of Retries. This property can be set in the Terminations property page.
See Invalid digits, No digits handling.
ITermDtmf
(Integer)
The digit which will be used to allow exit of the control while maintaining position in the queue.
UseDefaultError
(Boolean)
If UseDefaultError is set to False, two additional outputs are added to the control: Invalid and Silence. These outputs can be used to override the normal error handling for these conditions. When one of these conditions occurs, the call will move to the control connected to that output. This performs an equivalent function to the Invalid Digit and Silence handlers set in the Connections property page (NoDigitsErrorControl and InvalidErrorControl properties), but also provides visual representation on the form. This property can be set in the Terminations property page.
Runtime Properties
BargeIn
(Channel as Integer) Boolean
Set to True to enable barge-in or to False to disable it.
DoNBest
(Channel as Integer) Boolean
(Supported by Nuance only)
This property enables or disables N-Best processing at runtime.
GotoNode
(Integer)
This property will transfer a call to another control. See GotoNode.
Grammar
(Integer)
Set the grammar to be used for recognition. Used by MRCP and Nuance.
The NuanceGrammar property (see below) is still used for compatibility with the older versions of VBVoice.
MRCP specific: Grammar can be a URI or an inline grammar.
MaxKeys
(Channel as Integer) Integer
The maximum number of keys to receive before checking the buffer for a valid digit sequence. This value is set by the control to the default Max Keys set in the Terminations property page. It can be changed in the Enter event to another value if required. The new value only affects the current call until it leaves this control. Only applicable when recognizing digits.
MaxSil
(Channel as Integer) Integer
This is the maximum duration of silence (in seconds) while waiting for words before issuing a time-out. This property is set by the control to the value of Maximum Silence in the Terminations property page. It can be changed in the Enter event to another value if required. The new value will only affect the current call until it leaves this control. A value of 0 means that the control will not wait for digits after the greeting has finished playing, but will check the buffer immediately.
NuanceGrammar
(Channel as Integer) String
(Supported by Nuance only, obsolete)
Set the grammar to be used for recognition. The default value is the GrammarName property.
NumNBest
(Channel as Integer) Integer
This property controls how many N-Best results can be generated.
RecordDirectory
(Channel as Integer) String
(Supported by Nuance only)
This property sets the directory where the recognized utterances are to be saved.
RecordFilename
(Channel as Integer) String
(Supported by Nuance only)
This property sets the name of the file containing the recognized utterances.
To give each recording a distinct name, include %03d in the RecordFilename property; it is replaced by values from 001 to 999. For example, the filename msg%03d.wav sets the first RecordFilename to msg001.wav, the second to msg002.wav, and so on. Use of an asterisk (*) in the filename is not valid.
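The %03d placeholder behaves like a zero-padded, three-digit counter. The Python sketch below only illustrates the resulting names; the helper name and its validation are hypothetical, and how VBVoice itself advances the counter is not described here.

```python
def record_filename(pattern, sequence):
    """Expand a RecordFilename pattern such as msg%03d.wav."""
    if "*" in pattern:
        raise ValueError("an asterisk is not valid in the file name")
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must be between 1 and 999")
    # Replace the placeholder with a zero-padded three-digit number.
    return pattern.replace("%03d", f"{sequence:03d}")
```

Calling record_filename("msg%03d.wav", 2) yields msg002.wav, matching the naming described above.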
RecordUtterances
(Channel as Integer) Boolean
(Supported by Nuance only)
This property enables or disables the recording of recognized utterances.
ReleaseEngineOnExit
(Channel as Integer) Boolean
Set to True to release the recognition engine on exit from the control. It is recommended that the engine is released when leaving the last VoiceRec control required for the call, i.e. no more voice recognition is needed for the remainder of the call. For best performance the engine should be allocated for the duration of the call (AllocEnginePerCall=1 in vbvoice.ini, Nuance or MRCP section) and released when no more recognition is needed.
When set to False, the engine is released when the call ends.
Note that only one recognition engine may be allocated for any call. If the engine is released, all subsequent voice recognition requests during the current call will fail; a VoiceRec control cannot be used for a call once the recognition engine is released.
TermDtmf
(Channel as Integer) Boolean
Not used. Recognition is always terminated on any digit.
XMLRecResult
(String)
This property contains the recognition results in XML format, as demonstrated below.
Nuance:
<?xml version='1.0' encoding='utf-8'?>
<result>
  <interpretation conf="64">
    <text>"withdraw five hundred dollars from my checking account"</text>
    <slot>
      <slotname>"amount"</slotname>
      <slotvalue>"500"</slotvalue>
      <slotconf>"66"</slotconf>
    </slot>
    <slot>
      <slotname>"command-type"</slotname>
      <slotvalue>"withdraw"</slotvalue>
      <slotconf>"66"</slotconf>
    </slot>
    <slot>
      <slotname>"source-account"</slotname>
      <slotvalue>"checking"</slotvalue>
      <slotconf>"66"</slotconf>
    </slot>
  </interpretation>
</result>
(Supported by Nuance only)
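An application that reads XMLRecResult typically needs the slot names, values, and confidences as native data. The following is a minimal Python sketch using the standard xml.etree.ElementTree module; the function names are hypothetical, and it assumes the element layout and the literal double quotes inside element text shown in the sample above.

```python
import xml.etree.ElementTree as ET


def _unquote(text):
    """Strip whitespace and the literal double quotes around element text."""
    return (text or "").strip().strip('"')


def parse_xml_result(xml_text):
    """Return one dict per <interpretation> element in an XMLRecResult string."""
    # Encode first so the parser honors the encoding in the XML declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    results = []
    for interp in root.findall("interpretation"):
        slots = {}
        for slot in interp.findall("slot"):
            name = _unquote(slot.findtext("slotname"))
            slots[name] = {
                "value": _unquote(slot.findtext("slotvalue")),
                # A missing or empty slotconf is treated as 0 (assumption).
                "conf": int(_unquote(slot.findtext("slotconf")) or 0),
            }
        conf = interp.get("conf")
        results.append({
            "text": _unquote(interp.findtext("text")),
            "conf": int(conf) if conf is not None else None,
            "slots": slots,
        })
    return results
```

For the sample above, the first result would carry an overall confidence of 64 and an amount slot with value "500" and confidence 66.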
Words
(Channel as Integer) String
This property contains the words recognized by the control, separated by spaces. The format of the recognition results is discussed below.
The recognition result follows the ***RECOGNIZED: token as a string delimited by quotes, followed by the confidence score in the form (Conf=nn). If there are several recognition results, the tokens will be ***RECOGNIZED 0:, ***RECOGNIZED 1: and so on.
The natural language recognition result follows the ***INTERPRETATION: token as a string. If there are several recognition results, the tokens will be ***INTERPRETATION 0:, ***INTERPRETATION 1: and so on.
EXAMPLE
***RECOGNIZED: "two seven one eight nine" (Conf=74 )***INTERPRETATION: {<digits (2 7 1 8 9)>}
If the control exits via Dtmf node, Words will contain the digit which terminated the recognition.
(Slot-based confidence scoring)
The Nuance System is capable of generating confidence scores on a per-slot basis within a recognition result. This allows you to more closely analyze recognition results and handle any necessary error checking or re-prompting more naturally and efficiently. Slot-based confidence scoring lets you more closely identify the portions of a phrase that were likely to be accurately (or inaccurately) recognized.
To enable slot-based confidence scoring, set the rec.GenSlotConfidence=TRUE parameter in your RecServer initialization string.
EXAMPLE
recserver -package d:\nuance\(...)\banking1 rec.GenSlotConfidence=TRUE rm.Addresses=localhost lm.Addresses=localhost
Alternatively, add rec.GenSlotConfidence=TRUE to your Nuance-Resources.site file.
If you do not set rec.GenSlotConfidence=TRUE in either your Nuance-Resources.site file or the RecServer process, no slot-based confidence results are generated.
You must also design your grammars so that each slot is filled by a single subgrammar. See banking1.grammar in the Nuance sample package for an example of this design.
Once slot-based-confidence is enabled, a typical recognition result for "transfer five hundred dollars from my checking to my savings" may look like this:
***RECOGNIZED: "transfer five hundred dollars from my checking to my savings" (Conf=77 )***INTERPRETATION:{<amount 500> <command-type transfer> <destination-account savings> <source-account checking>}***SLOTCONFIDENCE:{<amount 79> <command-type 81> <destination-account 65> <source-account 79>}
In this particular case, the overall confidence is 77. The slot amount ("500") has a score of 79, the slot command-type ("transfer") has a score of 81, the slot destination-account ("savings") has a score of 65, and the slot source-account ("checking") has a score of 79.
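Per-slot scores make it possible to re-prompt only for the doubtful part of a phrase. The Python sketch below shows one way to pick those slots; the function name and the threshold of 70 are hypothetical and should be tuned for the application.

```python
def slots_to_reprompt(slot_confidence, threshold=70):
    """Return the names of slots whose confidence is below the threshold."""
    return sorted(
        name for name, score in slot_confidence.items() if score < threshold
    )


# Slot scores taken from the transfer example above.
scores = {
    "amount": 79,
    "command-type": 81,
    "destination-account": 65,
    "source-account": 79,
}
doubtful = slots_to_reprompt(scores)
```

With these scores, only destination-account falls below the assumed threshold, so the application could confirm just the destination account instead of repeating the whole prompt.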
If the Do N-Best property is selected, the recognition result for "withdraw five hundred dollars from my checking account" may look like this:
***RECOGNIZED 0: "withdraw five hundred dollars from my checking account" (Conf=64 )***INTERPRETATION 0:{<amount 500> <command-type withdraw> <source-account checking>}***SLOTCONFIDENCE 0:{<amount 66> <command-type 73> <source-account 63>}***RECOGNIZED 1: "withdraw five hundred dollars to my checking account" (Conf=64 )***INTERPRETATION 1:{<amount 500> <command-type withdraw> <destination-account checking>}***SLOTCONFIDENCE 1:{<amount 66> <command-type 73> <destination-account 56>}
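The Nuance result strings shown above can be split into fields with a few regular expressions. This is a Python sketch based only on the sample strings in this section; the function name and the returned dictionary layout are hypothetical, and real results may contain variations that the patterns do not cover.

```python
import re

# Token such as ***RECOGNIZED:, ***INTERPRETATION 1: or ***SLOTCONFIDENCE 0:
TOKEN = re.compile(r"\*\*\*\s*(RECOGNIZED|INTERPRETATION|SLOTCONFIDENCE)\s*(\d*)\s*:")
# Quoted utterance followed by (Conf=nn); spaces around nn are tolerated.
RECOGNIZED = re.compile(r'"(.*)"\s*\(Conf=\s*(\d+)\s*\)')
# Slot written as <name value>, for example <amount 500>.
SLOT = re.compile(r"<([\w-]+)\s+([^<>]+?)\s*>")


def parse_words(words):
    """Return a list of hypotheses, ordered by N-Best index."""
    results = {}
    matches = list(TOKEN.finditer(words))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(words)
        payload = words[match.end():end].strip()
        kind = match.group(1)
        index = int(match.group(2) or 0)  # no index means a single result
        entry = results.setdefault(
            index, {"text": None, "conf": None, "slots": {}, "slot_conf": {}}
        )
        if kind == "RECOGNIZED":
            rec = RECOGNIZED.search(payload)
            if rec:
                entry["text"] = rec.group(1)
                entry["conf"] = int(rec.group(2))
        elif kind == "INTERPRETATION":
            entry["slots"] = dict(SLOT.findall(payload))
        else:
            entry["slot_conf"] = {n: int(v) for n, v in SLOT.findall(payload)}
    return [results[k] for k in sorted(results)]
```

Each returned entry holds the utterance text, the overall confidence, the interpreted slots, and, when slot-based confidence is enabled, the per-slot scores.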
In MRCP, the recognition result is returned in the Words(channel) property in the format returned by the MRCP server. The application must parse it.
Greetings
EntryGreeting
This greeting is played as the initial prompt to begin recognition. When barge-in is enabled, this greeting will be cut off as the caller begins to speak.
InvalidGreeting
(Unsupported)
This greeting is played if the word or phrase was found in the grammar, but there was no matching entry in any of the conditions. Normally this can be avoided by designing the conditions and the grammar to match. This feature is no longer supported.
SilenceGreeting
This greeting is played if a response is not heard from the caller. After playing this greeting, the control plays the EntryGreeting and waits again for a response, until the error count is reached.
UnrecognizedGreeting
This greeting is played after the voice recognition engine has heard an utterance but was unable to match it to any of the words or phrases defined in the grammar. The greeting prompts the caller to try again.
Methods
AddGrammar
AddGrammar(Channel as Integer, Grammar as String, Type as vbvGrammarTypeConstants, GrammarID as String, Weight as Single) as Integer
Set the grammar to be used for recognition. Returns 0 if successful.
EXAMPLE
Dim grmType As vbvGrammarTypeConstants
grmType = vbvURI
Dim ret As Integer
' GrammarID ("ID1") and Weight (1) are separate arguments
ret = vrLanguage.AddGrammar(channel, "C:\mygrammar.grxml", grmType, "ID1", 1)
(Unsupported)
TakeCall
This method allows the programmer to override the graphical connections and transfer a call to any other control. See TakeCall.
Events
Disconnect
See Disconnect Event.
Enter
See Enter Event.
Exit
See Exit Event.
NoLicenseAvailable
NoLicenseAvailable(ByVal channel As Integer)
This event occurs if your Nuance license has been exceeded. Generally the call is terminated if this error is not processed. This event allows VB code to intercept the error and decide whether to allow the call to continue. When this error event occurs, the control generating the event is not generally able to continue normal processing. Your code can choose to ignore the error by redirecting the call to a new control by using the TakeCall method. If this action is not taken, the call is terminated.
PhraseError
See PhraseError Event.
PlayRequest
See PlayRequest Event.
VoiceError
See VoiceError Event.
VoiceRec Terminations Property Page (All Engines)
Disable global help digit
(DisableHelp property)
Set this check box to disable the help digit handler provided by the LineGroup control for this call. If this box is not checked, and the help digit specified by the LineGroup control is detected in the digits entered by the caller, the call will exit this control and enter the control connected to the LineGroup HelpDigit output. The help digit handler specified by the Connections property page will also be disabled by this field.
Use default error handler
(UseDefaultError property)
This check box is set by default. When the maximum number of retries for invalid digits or silence has been exceeded, the system checks for an error handling control or a connection on the LineGroup error output. If neither is present, the ERROR.WAV file is played and the system hangs up. If this check box is not set, two new outputs appear on the control: Invalid and Timeout. These outputs can be connected to other controls to override the default error handler. See Global Events.
Retry on silence
(RetryOnSilence property)
If this box is checked (the default), both silence timeout and invalid digit events use the retry count. If this box is unchecked, a silence timeout causes the call to exit via the silence output immediately, regardless of the number of retries set. Invalid digits increment the retry count as usual. This is useful in initial menus where you want to give the caller several attempts to enter the correct digit, but want callers without touch-tone phones to be transferred to an operator without undue delay.
Clear digits on entry
(ClearDigits property)
Set this button if you want to clear all previously collected digits from the VBVoice digit buffer.
Termination Conditions
Max digits
(IMaxKeys property)
(Unsupported)
This field sets the maximum number of digits that can be received before VoiceRec terminates digit collection. This feature is no longer supported.
Maximum silence
(IMaxSil property)
This field specifies the number of seconds that VoiceRec will wait for a word. If a word is not received within this time, recognition is terminated.
Number of retries on error
(MaxRetries property)
This field specifies the number of invalid or unrecognized recognition attempts, or silence errors that can occur before VoiceRec passes the call to the NoDigits or Invalid nodes, or invokes the default error handler.
Always terminate on
(ITermDtmf property)
This field specifies a digit that can be used to terminate digit collection. This can speed up menu selection by allowing the caller to press, for example, the # key to end a variable-length sequence of digits, rather than having to wait for a silence time-out.
Note: The grammar must be defined to accept this format.
Maximum time for speech
This field specifies the maximum time for which the control will listen to and analyze speech. After this time the control will stop listening and attempt to analyze the speech heard up to this point. The default, 0, means that there is no maximum time.
VoiceRec Grammar Property Page (Nuance)
Grammar To Load
(GrammarName property)
This field provides the name of the grammar to load from the Grammar File. If no name is set, the NuanceGrammar property must be set by code at runtime.
VoiceRec Setup Property Page
Barge-in
(IBargeIn property)
If checked, prompts are stopped when the caller starts speaking.
Release engine on exit
(ReleaseEngineOnExit property)
If checked, the Nuance recognition engine is released on exit from the control. It is recommended to release the engine when leaving the last VoiceRec control, when no more voice recognition is needed for the rest of the call. For best performance the engine should be allocated for the duration of the call (AllocEnginePerCall=1 in vbvoice.ini, Nuance section) and released when no more recognition is needed.
Do N-Best
(IDoNBest property)
If checked, N-Best processing is enabled: the recognition engine generates a list of possible recognition results, ranked from highest to lowest likelihood, instead of only the single best result. This capability is provided by the Nuance System's N-Best recognition processing method.
Record recognized utterances
(IRecordUtterance property)
If checked, recognized utterances are recorded.
Number of results
(INumNBest property)
This field sets the number of N-Best results to be generated.
Recording directory
(IRecordDirectory property)
This field sets the directory where the recognized utterances are to be saved.
Recording filename
(IRecordFilename property)
This field sets the name of the file containing the recognized utterances.